ABSTRACT

Internal evidence of cultural bias, in terms of various types of item
analysis, was sought in the Wonderlic Personnel Test results in large,
representative samples of whites and Negroes totalling some 1,500 subjects.
Essentially, the lack of any appreciable Race X Items interaction and the
high interracial similarity in rank order of item difficulties lead to the
conclusion that the Wonderlic shows very little or no evidence of cultural
bias with respect to the present samples, which, however, differ
appreciably in mean scores. The items which best measure the g factor within
each racial group are, by and large, the same items that show the largest
interracial discrimination.

An Examination of Culture Bias in the
Wonderlic Personnel Test



Arthur Jensen 
University of California, Berkeley 

Psychometricians are generally agreed that a population difference 
in average test score is not, by itself, evidence of biased sampling of test 
items such as to favor (or disfavor) a particular cultural group. The mean 
difference between groups may be explainable in terms of factors other 
than culture bias in the item content of the test. Evidence of culture 
bias thus depends upon criteria other than a group mean difference.

There are two main classes of criteria for assessing test bias: 
external and internal. They are complementary. The external criteria are 
the more important in terms of the practical usefulness of the test and 
where predictive validity for a specific quantifiable performance criterion 
is possible. Bias is indicated when two (or more) populations show signi- 
ficantly different regressions of criterion measures on test scores. If 
the regression lines for the two (or more) groups do not differ
significantly in intercept and slope, the test can be said to be "fair" to all
groups with respect to the given criterion of external validity.
Refinements and variations of this general external criterion for assessing test
bias have been discussed extensively in the measurement literature (e.g.,
Cleary, 1968; Darlington, 1971; Humphreys, 1973; Jensen, 1968; Linn, 1973; 
Thorndike, 1971). 

Internal criteria of cultural bias become important when discussing
the construct validity of the test and in assessing claims of bias even 
when the external validity criteria give no evidence of bias. Such claims 
of test bias are sometimes made on the grounds that the external criterion 
of the test's validity is itself culture-biased and is therefore predictable 
by a culture-biased test. Internal criteria of bias get around this argu- 
ment by examining the degree to which different socioeconomic and cultural 
groups differ in terms of various "internal" features of the test involving 
item statistics. The main criterion for the detection of bias lies in the
magnitude of the Groups X Items interaction relative to other sources of
variance in an analysis of variance (ANOVA) design comprised of Groups (G),
Items (I), Subjects within Groups (S), and the interactions G X I and S X I.
This method was first used by Cleary and Hilton (1968), who examined the
G X I interaction on two forms of the Preliminary Scholastic Aptitude Test
in white and Negro groups. The Race X Items interaction proved statistically
significant but contributed so minimally relative to the main effects that
the authors concluded: ". . . given the stated definition of bias, the
PSAT for practical purposes is not biased for the groups studied." Stanley
(1969) later showed that a considerable amount of the Race X Items
interaction was due to just a few items that were too difficult in both racial
groups and therefore did not discriminate much between them. Negroes scored
rather uniformly lower than whites on most of the items.

The Groups X Items interaction is analyzable into two effects: (a)
the similarity in the rank order of the percent passing each item (its
p value) in each of the groups, and (b) the similarity between the groups
in the differences between the p values of adjacent items in the test,
i.e., $p_1 - p_2$, $p_2 - p_3$, etc. These are here called p decrements.
Group differences in rank order of item difficulties are termed disordinal
interactions. Group differences in p decrements, when the rank order of p
values is the same in both groups, are termed ordinal interactions. A
measure of similarity between groups, such as the Pearson correlation
between the groups in p values and p decrements, can serve as a sensitive
index of the degree to which the groups behave differently with respect to
different items. Presumably not all items in any test are equally culture
biased, and to the degree that items differ in this property, the extent
of cultural differences between two groups relevant to performance on the
test should be related inversely to the size of the intergroup
correlations of p values and of p decrements. Also, if more test items are
culturally irrelevant or unreliable in one group than in another, this can
be expected to result in different magnitudes of the test's internal
consistency reliability in the two groups.

The present study examines the Wonderlic Personnel Test (WPT) for
evidence of culture bias in terms of these internal criteria when applied 
to representative white and Negro samples. The WPT is an obviously culture- 
loaded test of general intelligence. The fact that it is culture-loaded 
only means that most of the items are based on specific information and cog- 
nitive skills that are commonly acquired in present-day English-speaking 
western culture. This is obvious simply from inspection of the test items. 
Whether the obvious culture loading of the items biases the test to the dis- 
advantage of any particular population with respect to another population 
is a separate question which can be answered only in terms of empirical 
investigation of test data from the groups in question. 

The cultural-educational loading of the Wonderlic would seem to make
it suspect as a possibly culture-biased test in the American Negro population.
This should be a point of concern when the WPT is used in business and
industry, and especially where precise external criteria of the WPT's
validity in the white and Negro groups are not available. More than 6,500
organizations routinely use the WPT as a part of their personnel selection
and placement procedures, making it one of the most widely used tests of
mental ability.

Detailed descriptions of the WPT and references to previous research
can be found in Buros (1972, pp. 724-6). Briefly, the WPT is a
group-administered paper-and-pencil test of 50 verbal, numerical, and spatial
items arranged in spiral omnibus fashion. It is generally given with a
12-minute time limit. Alternate form reliabilities average .95. Use of
the WPT is claimed to have validity where educability or trainability is a
job requirement (Wonderlic & Wonderlic, 1972, p. 60). Large representative
samples of males and females show no significant difference in total raw
score on the WPT.

Negro Norms 

Norms based on 38,452 Negro job applicants have been published (Won- 
derlic & Wonderlic, 1972). The authors state: "The vast amount of data 
studied in this report confirms that a very stable differential in raw scores 
achieved by Negro applicant populations exists. Where education, sex, age,
region of country and/or position applied for are held constant,
Negro-Caucasian WPT score differentials are consistently observed. These mean
score differentials are . . . about one standard deviation apart when
comparisons of Caucasians and Negroes are studied" (p. 3). As the authors note
(p. 68), the Negro (as well as white) norms are based on biased samples of
the Negro (and white) populations to the extent that they are based on an
applicant population of individuals who are looking for jobs. The age group
from 20 to 24 is predominantly represented for both sexes and for both races.



The published norms show the mean and median test score of Negro and
white applicants for each of 80 different occupational categories, from the
professional-managerial level to unskilled labor. The correlation between
the Negro and white medians across the 80 occupational categories is .84
(the correlation between means is .87), indicating a high degree of similarity
between the racial groups in their self-selection for various occupations.
In other words, the rank order of median and mean test scores of applicants
for various jobs is very similar in the Negro and white populations, despite
the approximately 1 σ race difference in mean scores for all job categories.
Is there internal evidence in the test data that the 1 σ difference
between whites and Negroes is attributable in whole or in part to culture
bias in the WPT?

Method 

Subjects

Parallel analyses were performed on two pairs of white and Negro 
samples. Thus the findings from the main analyses are replicated in two 
sets of Negro-white comparisons based on samples selected in different ways.

Sample 1 consists of 544 white and 544 Negro Ss representing a random
sample of the nationwide population of job applicants on which the published
white and Negro norms are based for Form IV of the WPT. These large samples
thus closely approximate the score distributions of the normative white and
Negro populations, which have been given full statistical description in
the manual of norms of the WPT (Wonderlic & Wonderlic, 1972). The samples
were drawn without selection for characteristics such as age, education,
job category, sex, and region. All Ss coded as "other minority" or Ss with
Spanish surnames were excluded from the sample. In terms of the white σ
(standard deviation), the mean scores of the white and Negro samples differ
by 1.05 σ as compared with 1.00 σ in the total normative populations.

Sample 2 consists of randomly selected test protocols of 204 white
and 204 Negro Ss who were job applicants for entry level positions in a
single company in New York City. No selection was made on age, education,
and sex. Ss coded as "other minorities" and Spanish surnames are not
included in the white sample. The white and Negro means of Sample 2 are
very close to the national norms, but the SDs are almost double. (Sample 2:
White mean = 22.07, SD = 14.86; Negro mean = 15.63, SD = 13.89. National
Norms: White mean = 23.32, SD = 7.50; Negro mean = 15.80, SD = 7.06.) In
terms of the white sample SD, therefore, the Sample 2 white-Negro mean
difference is only 0.43 σ, although it is 0.86 σ in terms of the normative
white SD.
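
The standardized differences quoted above follow directly from the reported
means and SDs. The Python sketch below merely reproduces that arithmetic;
the function name is illustrative and is not part of the original analysis.

```python
def standardized_difference(mean_a, mean_b, sd):
    """Mean difference expressed in units of the chosen standard deviation."""
    return (mean_a - mean_b) / sd

# Sample 2 means, scaled first by the Sample 2 white SD,
# then by the normative white SD.
d_sample_sd = standardized_difference(22.07, 15.63, 14.86)
d_norm_sd = standardized_difference(22.07, 15.63, 7.50)

# National norm means, scaled by the normative white SD.
d_national = standardized_difference(23.32, 15.80, 7.50)

print(round(d_sample_sd, 2), round(d_norm_sd, 2), round(d_national, 2))
# prints: 0.43 0.86 1.0
```

The same 6.44-point difference thus looks half as large when scaled by the
nearly doubled SD of Sample 2.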

Results 

P Values and P Decrements 

The p value is the proportion of the total sample who answer a given
test item correctly. p values were obtained for items 1 - 50 in the white
and Negro groups.

The p decrement is the difference between the p values of ordinally
adjacent test items, e.g., $p_1 - p_2$, $p_2 - p_3$, etc., where the subscript
indicates the item number in the test. p decrements between adjacent items
1-2, 2-3, . . . , 49-50 were obtained in both samples.

Table 1 shows the mean p values within sets of 10 items (and for all
items) for each of the racial groups in Samples 1 and 2. The item p values
were correlated between racial groups within 10-item sets and over all 50
items. As can be seen in Table 1, these correlations are quite high even
within sets of 10 items. This means that the relative difficulty of the
items, as indicated by the proportion passing, is highly similar in the
white and Negro samples.



Insert Table 1 about here

The reliability of the p values within each racial group was
estimated by obtaining the correlations between the p values of the same racial
groups in Samples 1 and 2. These within-race correlations between p values
are all over .90 and for all 50 items the correlations (or reliability of
the p values) are .995 for whites and .992 for Negroes. Using the
reliabilities thus obtained, the interracial correlations between item p values
were corrected for attenuation, as shown in Table 1. The fact that the
correlations after correction for attenuation are distributed about a mean
of less than 1.00, of course, indicates that the interracial correlation of
p values is significantly less than the intraracial correlation. Yet the
corrected interracial correlations are very high, which means that the
relative item difficulties, though not identical, are much alike in the
white and Negro groups.
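
The correction for attenuation referred to here is presumably the standard
one, in which the observed correlation is divided by the square root of the
product of the two reliabilities. A brief Python sketch under that assumption
follows. The reliabilities .995 and .992 are those given above; the observed
interracial correlation of .95 is a hypothetical value chosen only for
illustration, since Table 1 is not reproduced in this text.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Disattenuated correlation: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)

observed_r = 0.95          # hypothetical interracial correlation of p values
reliability_white = 0.995  # within-race reliability reported in the text
reliability_negro = 0.992  # within-race reliability reported in the text

corrected_r = correct_for_attenuation(
    observed_r, reliability_white, reliability_negro
)
print(round(corrected_r, 3))
```

Because both reliabilities are so close to 1.00, the correction changes the
correlation only in the third decimal place.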

The p decrements were treated in exactly the same way. Since p
decrements, unlike p values, are not systematically correlated with the
item's ordinal position in the test, the interracial correlation between
p decrements is a more sensitive index of group similarity than the
correlation of p values. A high interracial correlation between p decrements
means that the relative differences in difficulty between adjacent items
are much alike in the two racial groups. If some items were more
racially-culturally biased than others, resulting in different relative difficulties
for whites and Negroes, it would be reflected in a low interracial correlation
between item p decrements, both with or without correction for attenuation.
As can be seen in Table 1, this is not the case. The interracial
correlations of p decrements are remarkably high. They are distributed about a
mean of less than 1.00, however, which means that there is a slight but
significant difference in the relative p decrements of the white and Negro
groups.

P Values and P Decrements for Attempted Items Only 

As the WPT is a timed test, very few Ss attempt every item. The
typical pattern of response for most Ss is to answer the first 10 or 15 
items and then to begin to skip around looking for items that appear rela- 
tively easy for them in order to obtain the highest score they possibly 
can in the time available. Items which were left unanswered by the S are 
considered to be not attempted. 

Table 2 shows (a) the mean proportion of each group attempting
items (in sets of 10 items), (b) the interracial correlation (corrected
for attenuation) between these proportions, (c) the mean proportion, P,
passing the attempted items, (d) the interracial correlations of P, and
(e) the correlation between proportion attempting and proportion passing
the items.



Insert Table 2 about here 



Whites and Negroes are highly similar in the proportions attempting 
each item. The similarity is even greater for the proportion of each group 
passing the attempted items; the interracial correlations, when corrected
for attenuation, generally do not differ significantly from unity. Overall,
in both Samples 1 and 2, the interracial correlations of item difficulties
of attempted items are so high as to indicate that the items have essentially
the same relative difficulties in the white and Negro groups.

White-Negro Differences According to Type of Items 

It is often claimed that Negroes perform relatively less well on 
verbal items than on other types, since presumably verbal content allows 
wider scope for cultural variations and the effects of bias on Negro scores.
To see if this notion holds true for the various kinds of item content in
the WPT, items were classified as shown in Table 3 and the mean White-Negro 
difference in these item categories was determined. 



Insert Table 3 about here 



Since items in different categories occur unsystematically at different
ordinal positions in the test and have different overall levels of
difficulty in both racial groups, it was necessary, in order to make the
appropriate comparisons, to transform the proportion passing to an index of item
difficulty which constitutes an interval scale. As explained by Guilford
(1954, pp. 418-419), this is accomplished by expressing the proportion
passing in terms of the z score deviations of the normal curve. The group
mean difference is thus expressed in σ units of z score deviations. For example,
if on a given item Group A has 84% passing and Group B has 60% passing, the
corresponding z scores (from the table of areas under the normal curve) are
+1.00 and +0.25 and the difference between Group A and Group B is 1.00 - .25 =
.75 σ. By thus transforming p values to z scores, items of different
difficulty in the two groups can be compared on an interval scale, permitting
direct comparisons of the mean White-Negro z score differences for different
types of items.
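
The transformation can be sketched in Python with the standard library's
inverse normal distribution function, which stands in for the printed table
of areas under the normal curve. The function names are illustrative
assumptions. The inverse function is undefined for proportions of exactly 0
or 1, so such items would need separate handling.

```python
from statistics import NormalDist

def p_to_z(p):
    """Normal deviate corresponding to a proportion passing (0 < p < 1)."""
    return NormalDist().inv_cdf(p)

def z_difference(p_group_a, p_group_b):
    """Group difference on an item, in normal-deviate (z) units."""
    return p_to_z(p_group_a) - p_to_z(p_group_b)

# The worked example from the text: 84% versus 60% passing.
z_a, z_b = p_to_z(0.84), p_to_z(0.60)
diff = z_difference(0.84, 0.60)
print(round(z_a, 2), round(z_b, 2), round(diff, 2))
# prints: 0.99 0.25 0.74
```

The exact deviates (0.99 and 0.25) give a difference of 0.74; the .75 in the
text comes from rounding the first deviate to +1.00.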

Table 3 shows the mean z scale difference between the white and
Negro group on the various types of items, as well as the SD of the N items
of each type. Because of the small numbers of items in the separate
categories, the most important comparisons are between the totals for Verbal,
Numerical, and Logical Reasoning. Also, more weight probably should be
given to the results for attempted items. In Sample 1 there were no
individual items with negative z values, either for all items or for attempted
items, and there were only five such items among those attempted in Sample
2; in all cases these were items attempted by fewer than 8% of either group.
That is to say, whites did better on all items attempted by more than 8%
of Ss in either group. There is no regular tendency for the White-Negro
difference to be greater for the verbal than for numerical or logical
reasoning, and the smallest differences are in factual information and the
interpretation of proverbs, which, surprisingly, are the types of items
that are so often held up as examples of culture-loaded test items. There
is no consistent difference between "all items" and "attempted items."
Overall the White-Negro difference is about as great for the attempted
items as for all the items. The rather low degree of consistency between
results for Samples 1 and 2 would seem to make unwarranted any strong
conclusions from the analysis in Table 3. What it does illustrate is the lack
of any marked or consistent tendency for any one type of item to be more
racially discriminating than other types, as the items are here classified.

If specific type of content is not systematically related to the
item's racial discriminability, is there any item characteristic that is
so related? It was hypothesized that the items' g loadings (or loadings on
the first principal component) when the item intercorrelation matrix is
factor analyzed within each racial group separately would be most highly
related to the item's discriminability between the racial groups. That is
to say, the more highly an item is correlated with the general factor
common to all items, within either racial group, the more highly it will
discriminate between the racial groups. To test this hypothesis, the items'
loadings on the first principal component (the g factor of the item
intercorrelation matrix) were obtained from separate principal components
analyses of the white and Negro data (Sample 2). The items' factor loadings were
correlated with the items' index of interracial discriminability (Table 3),
for all items, not just attempted items. The Pearson correlation is .47
in the White sample and .62 in the Negro sample. For items with g loadings
of greater than .40, the mean White-Negro z difference is .64 (for factor
loadings in White sample) and .67 (for factor loadings in Negro sample);
while for items with loadings of less than .40, the corresponding
differences are .36 and .37, respectively. A similar relationship holds also
for attempted items. The White-Negro z difference for all items with
loadings of more than .40 on g is .52 (in White sample) and .66 (in the Negro);
the corresponding figures for items loaded less than .40 are .35 and .31.
When this was cross-validated in Sample 1, the White-Negro z difference
for all items with g loadings greater than .40 is .78 (for White sample)
and .79 (for Negro sample); the z differences for all items with g loadings
less than .40 are .54 (White sample) and .55 (Negro sample). The
cross-validating correlations between the Sample 2 factor loadings and the Sample 1
White-Negro z differences are .44 (White sample) and .33 (Negro sample).
What all this means is that there is a substantial relationship between
the size of the item loadings on the general factor common to all items
in the Wonderlic and the magnitude of the White-Negro difference on the
item, and this is true whether the g factor is determined in the White or
in the Negro sample. Neither the loadings on any components other than
the first principal component (i.e., g) nor type of item content reveals
any systematic relationship to the item's interracial discriminability.
On the other hand, the items that best measure the general factor within
each racial group are the same items, by and large, that discriminate most
highly between the racial groups.
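
A rough Python sketch of this procedure is given below. It assumes the
first principal component is extracted from an item intercorrelation matrix
by power iteration, which is one of several ways to obtain it and is not
necessarily the routine used for the analyses reported here. The 3-item
matrix and z differences are hypothetical toy values, and the function names
are illustrative.

```python
import math

def first_component_loadings(corr, iterations=500):
    """Loadings on the first principal component of a correlation matrix,
    found by power iteration. Starting from a vector of equal weights
    assumes the items are mostly positively intercorrelated."""
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(
        v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k)
    )
    # A loading is the eigenvector element scaled by sqrt(eigenvalue).
    return [x * math.sqrt(eigenvalue) for x in v]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical item intercorrelations and between-group z differences.
toy_corr = [[1.0, 0.5, 0.3],
            [0.5, 1.0, 0.2],
            [0.3, 0.2, 1.0]]
toy_z_diff = [0.70, 0.55, 0.30]

loadings = first_component_loadings(toy_corr)
print([round(x, 2) for x in loadings])
print(round(pearson(loadings, toy_z_diff), 2))
```

In the study the loadings come from within-group matrices, so the correlation
with the between-group z differences is computed once for each group's
loadings.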

Analysis of Variance: Items X Subjects Matrix 

The Race X Items interaction in a complete ANOVA of the Items X 
Subjects matrix provides a sensitive index of item bias relative to other 
sources of variance. Using the Sample 2 data, three such ANOVAs were
performed: (1) on the total white and Negro groups, (2) on white and Negro
groups equated on total WPT score, and (3) on "pseudo-racial" groups
comprised entirely of two groups of white Ss selected so that their total
WPT score distributions closely match the normative white and Negro
distributions in means and SDs. The ANOVAs for each of these conditions are
summarized in Table 4. So that the three analyses can be directly compared,
the sum of squares for each source in the ANOVA is converted to omega
squared times 100 ($\omega^2 \times 100$), which is the percent of the total
variance attributable to the given source.



Insert Table 4 about here

For the ANOVA of the total white and Negro samples, all of the effects
are significant beyond the .001 level, including the Race X Items interaction.
But once the statistical significance of this interaction is shown, more
important than statistical significance is the magnitude of the interaction
relative to other sources of variance. The smaller it is, the more "fair"
the test as regards culture bias. The appropriate index of "fairness,"
thus defined, is the A/B ratio, which, in terms of $\omega^2$, is
$A = R/S$ and $B = (R \times I)/(I \times S)$. In terms of F,
$A/B = F_R / F_{R \times I}$. The two formulas for the A/B
ratio are algebraically equivalent. If the Race X Items interaction is
nonsignificant, it is presumed that no bias has been demonstrated and there is
no point in computing the A/B ratio. The lower the value of the A/B ratio,
the easier it would be to equalize or reverse the racial group means by
item selection. Obviously a small group mean difference along with a large
Groups X Items interaction would mean that a somewhat different selection
of items from the same item population could equalize or reverse
the group means. The higher the value of A/B, the less is the
possibility of equalizing the group means through item selection from a
similar population of items. This would not rule out the possibility of
introducing different kinds of items into the test, but if doing so decreases
the A/B ratio (even though it decreases the group mean difference), it can
be argued that the minimizing of the group mean difference is simply a
result of balancing item biases. Some tests equate male and female scores
on this basis, balancing items that favor one sex with the selection of
items that favor the other. A test constructed in this way, resulting in
little or no mean sex difference but a large Sex X Items interaction, of
course cannot be used for studying the question of sex differences in the
ability which the test purports to measure. The same thing would be true
of any test which was made to equalize racial group differences at the
expense of greatly increasing the Race X Items interaction. The desirable
condition is to minimize the interaction as much as possible.
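
The design can be sketched in Python as follows, on the assumption that it
is the usual mixed layout: Groups between subjects, Items within subjects,
Subjects nested within Groups, with the Groups effect tested against Subjects
within Groups and the Groups X Items interaction tested against Subjects X
Items. Taking the A/B ratio as the ratio of these two F values is a reading
of the formula in the text, not a confirmed reproduction of it. The data are
hypothetical toy scores and the function name is illustrative.

```python
def groups_by_items_anova(groups):
    """Sums of squares, df, and F ratios for a Groups x Items design with
    subjects nested in groups. `groups` is a list of groups, each a list
    of subjects, each a list of 0/1 item scores of equal length."""
    a = len(groups)
    k = len(groups[0][0])
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(s) for g in groups for s in g) / (n_total * k)
    item_mean = [
        sum(s[i] for g in groups for s in g) / n_total for i in range(k)
    ]
    ss = dict.fromkeys(["G", "S(G)", "I", "GxI", "SxI(G)"], 0.0)
    ss["I"] = n_total * sum((m - grand) ** 2 for m in item_mean)
    for g in groups:
        n_g = len(g)
        g_mean = sum(sum(s) for s in g) / (n_g * k)
        gi_mean = [sum(s[i] for s in g) / n_g for i in range(k)]
        ss["G"] += n_g * k * (g_mean - grand) ** 2
        for i in range(k):
            ss["GxI"] += n_g * (gi_mean[i] - g_mean - item_mean[i] + grand) ** 2
        for s in g:
            s_mean = sum(s) / k
            ss["S(G)"] += k * (s_mean - g_mean) ** 2
            for i in range(k):
                ss["SxI(G)"] += (s[i] - s_mean - gi_mean[i] + g_mean) ** 2
    df = {
        "G": a - 1,
        "S(G)": n_total - a,
        "I": k - 1,
        "GxI": (a - 1) * (k - 1),
        "SxI(G)": (n_total - a) * (k - 1),
    }
    ms = {key: ss[key] / df[key] for key in ss}
    f_groups = ms["G"] / ms["S(G)"]
    f_interaction = ms["GxI"] / ms["SxI(G)"]
    return ss, df, f_groups, f_interaction

# Hypothetical toy data: two groups, three Ss each, four items.
toy = [
    [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]],
    [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]],
]
ss, df, f_groups, f_interaction = groups_by_items_anova(toy)
print(df)
print(round(f_groups, 2), round(f_interaction, 2),
      round(f_groups / f_interaction, 2))
```

As a check on the degrees of freedom, 254 matched Ss and an interaction df
of 48 give (254 - 2) x 48 = 12,096 error df, the value reported for the
equated samples.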

The A/B ratio for the total samples (Table 4) is 10.84. For
comparison, a similar study of white and Negro elementary pupils showed an
A/B ratio of 7.10 on the culture-loaded Peabody Picture Vocabulary Test
and of 17.32 on the culture-reduced Raven's Progressive Matrices (Jensen,
in press).

ANOVA on Equated White and Negro Samples. In a previous study it
was found that when groups of white and Negro school children were roughly
matched for mental age (rather than chronological age), and an ANOVA of the
Peabody Picture Vocabulary Test (PPVT) items was performed, the Race X
Items interaction was greatly reduced from its magnitude when the two racial
groups were of the same chronological age but different mental ages (Jensen,
in press). This finding suggests that a large part of the Race X Items
interaction is attributable to a mental maturity X items interaction rather
than to a racial-cultural difference per se. And this hypothesis was
strengthened by showing that the same magnitude of the actual Race X Items
interaction could be achieved entirely with the white sample, simply by
dividing it into two "pseudo-racial" groups for the ANOVA. One group of
white Ss was selected so that their distribution of total PPVT scores
matched the Negro distribution in mean and SD; the other group of white Ss
was selected so that its PPVT score distribution matched the total white
distribution. When these two culturally homogeneous groups, corresponding
to the Negro and white samples, were subjected to the same ANOVA as was
applied to the true racial groups, it reproduced the same results almost
perfectly, including the Race X Items interaction. In other words, an
interaction of this magnitude could be attributed to an average ability
difference between the groups rather than to a cultural difference.

The same kind of analysis is here applied to the Wonderlic data.
Since mental age is not a meaningful scale in an adult population, Negro
and white Ss were simply matched for total score on the WPT. Perfect
matching was possible on 127 White-Negro pairs, making the white and Negro
total score distributions identical.
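
The text does not say how the pairs were formed. One simple procedure that
yields identical total score distributions is sketched below in Python: each
S in one group is paired with a not-yet-used S in the other group who has
exactly the same total score, and unmatched Ss are dropped. This is an
assumption made for illustration, with hypothetical scores and an
illustrative function name.

```python
from collections import defaultdict

def match_on_total_score(scores_a, scores_b):
    """Pair each S in group A with an unused S in group B having the
    identical total score. Returns a list of (index_a, index_b) pairs."""
    unused_b = defaultdict(list)
    for idx_b, score in enumerate(scores_b):
        unused_b[score].append(idx_b)
    pairs = []
    for idx_a, score in enumerate(scores_a):
        if unused_b[score]:
            pairs.append((idx_a, unused_b[score].pop(0)))
    return pairs

# Hypothetical total scores for two small groups.
white_scores = [10, 10, 12, 15]
negro_scores = [10, 12, 12, 20]

pairs = match_on_total_score(white_scores, negro_scores)
print(pairs)  # [(0, 0), (2, 1)]
```

After such matching the two groups have the same mean and SD by
construction, so the main effect of groups in the ANOVA is necessarily zero.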

If the WPT items are culture-biased for Negroes, one might expect
that whites and Negroes with the same total scores would obtain them in
different ways, so that even when the main effect of Race is zero in the
ANOVA, the Race X Items interaction would remain.

Table 4 shows the results of the ANOVA on the equated samples. The
main effect of race was, of course, forced to be zero by equating the groups.
But note that the Race X Items interaction is very small and nonsignificant
(F = 1.25, df = 48/12,096, p > .10). This finding is consistent with the
hypothesis that the R X I interaction in the ANOVA of the total samples is
due to the average difference in ability between the groups rather than to
a cultural difference. It seems less likely that equating the white and
Negro groups for total score should wipe out an R X I interaction if it
truly reflected a cultural difference between the white and Negro groups.

One might argue that white and Negro Ss who attain the same total
score must be highly similar in cultural background and therefore would
show no significant R X I interaction. But are they culturally more
similar than individuals of the same racial group who differ by 7 points in
total Wonderlic score? (The σ of total scores in the normative white
population is close to 7.) Siblings reared together in the same family differ
by almost as much. Since the white and Negro population means differ by
close to 1 σ (or 7 points on the WPT), we can do an ANOVA on a
"pseudo-race" comparison by making up two groups of white Ss selected so that
their score distributions closely approximate those of Negroes and whites.
This was done by ranking all white scores from highest to lowest, and then,
working in from both ends of the distribution, selecting pairs of Ss who
differ by exactly 7 points in total score. The means of the two
distributions differ by 7 points and they have the same SD = 12.78.

Table 4 shows the ANOVA of these "pseudo-race" groups. It can be
seen that the results resemble the true racial comparison (Table 4, Total
Samples), especially as regards the R X I interaction, which for the Total
Samples constitutes 1.04% of the variance and for the "pseudo-racial"
samples is 0.94%. The F for the R X I interaction is significant beyond
the .001 level for both the Total Sample and Pseudo-Race Sample, and the
A/B ratios are 10.84 and 16.31, respectively. The ratio of $\omega^2$ for the
interactions (R X I / Ss X I) is .019 in both the Total Sample and the
"Pseudo-Race" Sample. All this indicates that a large part of the R X I
interaction can be attributed to a level-of-ability X items interaction,
since it is shown to exist in the "pseudo-race" groups which are both
comprised of white Ss differing in average ability. If the significant
R X I interaction were explainable only in terms of cultural differences
between the white and Negro groups, it seems highly improbable that it
could be reduced to nonsignificance simply by equating the racial groups
for overall level of ability, or that the same significant interaction
could be produced within a culturally homogeneous white sample divided
into high and low ability groups with overlapping score distributions
similar to the total white and Negro distributions. In brief, from these
three ANOVAs shown in Table 4, it would be extremely difficult to make a
case that the Race X Items interaction is attributable to cultural bias.
These analyses should have produced markedly different results if the
popular claims of culture bias were in fact valid.

Discussion and Conclusion 

Several different analyses of test item characteristics have failed
to reveal evidence of culture bias for large Negro and white samples on
the Wonderlic Personnel Test. If some items were more culture biased than
others with respect to the cultural backgrounds of Negroes and whites, one
should expect (a) significantly different rank order of p values (percent
passing) for various items in the white and Negro samples, (b) significantly
different intervals (i.e., p decrements) between the p values of adjacent
test items in white and Negro samples, (c) a significant Race X Items
interaction in the analysis of variance of the Race X Items X Subjects score
matrix, even when both racial groups are equated for total score, and (d)
systematic differences in the types of item content that discriminate most
and least between the white and Negro samples. None of these expectations
was borne out by the present data. The small but significant Race X Items
interaction could be reduced to nonsignificance by equating the white and
Negro groups for overall score, which would not be expected if the two
groups differed culturally in reaction to the test items. Moreover, it
was possible to produce a significant "Pseudo-Race" X Items interaction
within the culturally homogeneous white group simply by dividing the total
white sample into two groups, one which duplicates the mean and SD of the
Negro norms and the other which duplicates the mean and SD of the white
norms. This suggests that the Race X Items interaction is really an
ability level X items interaction rather than an interaction due to
cultural differences.

The only way one could view these findings as being not incompatible
with the hypothesis that the Wonderlic is a culturally biased test for
Negroes would be to claim that culture bias depresses Negroes' performance
on all the test items to much the same degree, which seems highly unlikely
for cultural effects per se, and especially considering the great variety of
item content in the Wonderlic. Otherwise it should be possible to make up
subscales consisting of items on which the Negro group on the average does
as well or better than the white group. This, however, is not possible with
the present pool of Wonderlic items. The items that best measure the general
factor common to all items within each racial group are also the same items
that discriminate the most between the racial groups.

The present analyses yield no consistent or strong evidence that the
Wonderlic is reacted to in any way differently in the Negro and white
samples, except in overall level of performance, in which the normative
populations differ by about one standard deviation. The present evidence lends
no support to the hypothesis that the cause of this difference in average
score on the Wonderlic is explainable in terms of cultural bias.